DigitalCommons@Kennesaw State University

A Phylogenomic Assessment of Ancient Polyploidy and Genome Evolution Across the Poales

Author: McKain Michael R.
McNeal Joel R.
Tang Haibao
et al.
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 07/03/2016

Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intra-genomic syntenic blocks. Three such whole genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification.
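
The abstract above refers to the GC content of protein coding sequences. As a rough illustration of that quantity only (not the authors' pipeline), the following minimal Python sketch computes overall GC content and, as an added assumption not mentioned in the abstract, GC content at third codon positions. The function names `gc_content` and `gc3_content` are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Ambiguous bases such as N are ignored; an empty sequence yields 0.0.
    return gc / acgt if acgt else 0.0


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence.

    Assumes cds starts at codon position 1; this is an illustrative choice.
    """
    return gc_content(cds[2::3])


example_cds = "ATGGCC"  # two codons: ATG, GCC
overall = gc_content(example_cds)
third = gc3_content(example_cds)
```

For the two-codon example, four of six bases are G or C, and both third-position bases (G and C) count toward GC3.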

SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand

Author: Bomhoff Matthew D.
Briones Evan
Lyons Eric
Schnable James C.
Tang Haibao
Zhang Liangsheng
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 11/11/2015

The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc.

Springer - Publisher Connector

Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses

Author: Bowers John E
Gowik Udo
Paterson Andrew H
Tang Haibao
Wang Xiyin
Westhoff Peter
Publication venue: BioMed Central
Publication date: 01/01/2009

Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway.

Genotype-Corrector: improved genotype calls for genetic mapping in F2 and RIL populations

Author: Fang Jingping
Li Delin
Liang Pingping
Miao Chenyong
Schnable James C.
Tang Haibao
Yang Jinliang
Zhang Xingtan
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2018

F2 and recombinant inbred line (RIL) populations are very commonly used in plant genetic mapping studies. Although genome-wide genetic markers like single nucleotide polymorphisms (SNPs) can be readily identified by a wide array of methods, accurate genotype calling remains challenging, especially for heterozygous loci and missing data due to low sequencing coverage per individual. Therefore, we developed Genotype-Corrector, a program that corrects genotype calls and imputes missing data to improve the accuracy of genetic mapping. Genotype-Corrector can be applied in a wide variety of genetic mapping studies that are based on low coverage whole genome sequencing (WGS) or Genotyping-by-Sequencing (GBS) related techniques. Our results show that Genotype-Corrector achieves high accuracy when applied to both synthetic and real genotype data. Compared with using raw or only imputed genotype calls, the linkage groups built by corrected genotype data show much less noise, and significant distortions can be corrected. Additionally, Genotype-Corrector compares favorably to the popular imputation software LinkImpute and Beagle in both F2 and RIL populations.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Microcollinearity between autopolyploid sugarcane and diploid sorghum genomes

Author: Bowers John
Chen Cuixia
Hudson Matthew E
Macmil Simone
Ming Ray
Moose Stephen P
Murray Jan E
Najar Fares
Paterson Andrew H
Roe Bruce
Rokhsar Daniel S
Tang Haibao
Van Sluys Marie-Anne
Wang Jianping
Wiley Graham
Yu Qingyi
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Sugarcane (Saccharum spp.) has become an increasingly important crop for its leading role in biofuel production. The high sugar content species S. officinarum is an octoploid without known diploid or tetraploid progenitors. Commercial sugarcane cultivars are hybrids between S. officinarum and wild species S. spontaneum with ploidy at ~12×. The complex autopolyploid sugarcane genome has not been characterized at the DNA sequence level.
Results: The microsynteny between sugarcane and sorghum was assessed by comparing 454 pyrosequences of 20 sugarcane bacterial artificial chromosomes (BACs) with sorghum sequences. These 20 BACs were selected by hybridization of 1961 single copy sorghum overgo probes to the sugarcane BAC library with one sugarcane BAC corresponding to each of the 20 sorghum chromosome arms. The genic regions of the sugarcane BACs shared an average of 95.2% sequence identity with sorghum, and the sorghum genome was used as a template to order sequence contigs covering 78.2% of the 20 BAC sequences. About 53.1% of the sugarcane BAC sequences are aligned with sorghum sequence. The unaligned regions contain non-coding and repetitive sequences. Within the aligned sequences, 209 genes were annotated in sugarcane and 202 in sorghum. Seventeen genes appeared to be sugarcane-specific and all were validated by sugarcane ESTs, while 12 appeared sorghum-specific but only one was validated by sorghum ESTs. Twelve of the 17 sugarcane-specific genes have no match in the non-redundant protein database in GenBank, perhaps encoding proteins for sugarcane-specific processes. The sorghum orthologous regions appeared to have expanded relative to sugarcane, mostly by the increase of retrotransposons.
Conclusions: The sugarcane and sorghum genomes are mostly collinear in the genic regions, and the sorghum genome can be used as a template for assembling much of the genic DNA of the autopolyploid sugarcane genome. The comparable gene density between sugarcane BACs and corresponding sorghum sequences defied the notion that polyploid species might have a faster pace of gene loss due to the redundancy of multiple alleles at each locus.
Acknowledgements: We acknowledge our colleagues at the University of Oklahoma's Advanced Center for Genome Technology, Chunmei Qu and Ping Wang for their assistance with 454 GS-FLX sequencing sample preparation and Steve Kenton for his help with deconvoluting the pooled BACs and their subsequent assembly. We also thank Eric Tang for assistance on sequencing two BACs using Sanger sequencers. This project is supported by start-up funds from the University of Illinois to RM and a grant from the Energy Bioscience Institute (EBI) to SPM, MEH, RM, and DSR.

Springer - Publisher Connector

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Screening synteny blocks in pairwise genome comparisons through integer programming

Author: Andrew H Paterson
BJ Haas
Brent Pedersen
C Simillion
C Soderlund
E Lyons
Eric Lyons
G Tesler
H Tang
Haibao Tang
HW Six
James C Schnable
JE Bowers
JM Aury
JM Catchen
K Yogeeswaran
L Cui
M Kellis
Michael Freeling
O Jaillon
P Pevzner
Q Peng
R Warren
RM Karp
S Schwartz
SF Altschul
W Miller
WJ Kent
X Wang
Y Van de Peer
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events.
Results: We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons).
Conclusions: The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST to the screened synteny blocks with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering easier access to thousands of genomes for non-programmers.
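
The quota-based screening described in this abstract can be pictured as choosing a 0/1 value for each synteny block so that the total score is maximal while no position in either genome is covered by more blocks than its quota allows. The Python sketch below is a toy illustration under stated assumptions, not the QUOTA-ALIGN implementation: it replaces the integer programming solver with exhaustive search over subsets (feasible only for a handful of blocks), and the block tuples, interval coordinates, and the names `max_depth`, `screen_blocks`, `max_cover_a` and `max_cover_b` are all hypothetical.

```python
def max_depth(intervals):
    """Largest number of closed integer intervals covering any one position."""
    events = []
    for start, end in intervals:
        events.append((start, 1))     # interval opens at start
        events.append((end + 1, -1))  # interval closes just after end
    depth = best = 0
    # At equal positions, closings (-1) sort before openings (+1).
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best


def screen_blocks(blocks, max_cover_a, max_cover_b):
    """Pick the highest-scoring subset of blocks that respects both quotas.

    Each block is (name, score, interval_in_A, interval_in_B).
    Exhaustive search stands in for a binary integer programming solver.
    """
    best_score, best_names = 0, []
    for mask in range(1 << len(blocks)):
        chosen = [b for i, b in enumerate(blocks) if mask >> i & 1]
        if max_depth([b[2] for b in chosen]) > max_cover_a:
            continue  # some region of genome A is matched too many times
        if max_depth([b[3] for b in chosen]) > max_cover_b:
            continue  # some region of genome B is matched too many times
        score = sum(b[1] for b in chosen)
        if score > best_score:
            best_score, best_names = score, [b[0] for b in chosen]
    return best_score, best_names


# Toy data: genome B is assumed to have had one extra WGD, so a region of A
# may match up to two regions of B, while a region of B matches one of A.
blocks = [
    ("b0", 90, (1, 100), (1, 100)),
    ("b1", 80, (1, 100), (201, 300)),
    ("b2", 30, (1, 100), (401, 500)),
    ("b3", 50, (150, 250), (1, 50)),
]
score, kept = screen_blocks(blocks, max_cover_a=2, max_cover_b=1)
```

In this toy instance the weak third match to the same region of A (`b2`) is dropped, and `b3` is rejected because it competes with the stronger `b0` for the same region of B.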

Springer - Publisher Connector

Directory of Open Access Journals

The University of Arizona

eScholarship - University of California

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data

Author: Aguilar-Ruiz
Andrew H. Paterson
Armstrong
Ashburner
Barkow
Ben-dor
Bryan
Carmona-Saez
Castillo-Davis
Cheng
Eisen
Faith
Gasch
Getz
Golub
Guojun Li
Haibao Tang
Hartigan
Huttenhower
Ihmels
Kanehisa
Keseler
Kluger
Kung
Li
Liu
Madeira
McLachlan
Morgan
Murali
Prelic
Qin Ma
Reiss
Ruepp
Shamir
Tanay
Xu
Yeung
Ying Xu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Biclustering extends the traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form, compared to existing algorithms, through employing a combination of qualitative (or semi-quantitative) measures of gene expression data and a combinatorial optimization technique. One key unique feature of the QUBIC algorithm is that it can identify all statistically significant biclusters including biclusters with the so-called ‘scaling patterns’, a problem considered to be rather challenging; another key unique feature is that the algorithm solves such general biclustering problems very efficiently, capable of solving biclustering problems with tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated a considerably improved biclustering performance by our algorithm compared to the existing algorithms on various benchmark sets and data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/~maqin/bicluster. A server version of QUBIC is also available upon request.
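
To give a feel for why a qualitative representation can help with ‘scaling patterns’, the Python sketch below discretizes each gene's expression values by rank into three levels (-1, 0, +1). This is an assumed, simplified scheme for illustration and should not be read as QUBIC's actual discretization; the function name `qualitative_row` and the parameter `tail` are hypothetical.

```python
def qualitative_row(values, tail):
    """Label the `tail` lowest values -1, the `tail` highest +1, the rest 0.

    Rank-based labelling is an illustrative assumption, not QUBIC's procedure.
    Requires 2 * tail <= len(values).
    """
    n = len(values)
    if 2 * tail > n:
        raise ValueError("tail is too large for this row")
    order = sorted(range(n), key=lambda i: values[i])  # condition indices by value
    labels = [0] * n
    for i in order[:tail]:
        labels[i] = -1
    for i in order[n - tail:]:
        labels[i] = 1
    return labels


gene_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_y = [10.0 * v for v in gene_x]  # same pattern, scaled by a positive factor
labels_x = qualitative_row(gene_x, tail=2)
labels_y = qualitative_row(gene_y, tail=2)
```

Because labels depend only on rank order, two genes whose profiles differ by a positive scaling factor receive identical qualitative rows, so they can be grouped without comparing raw magnitudes.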

CiteSeerX

Digital Repository @ Iowa State University (ISU)

A draft physical map of a D-genome cotton species (Gossypium raimondii)

Author: Bowers John E
Braidotti Michele
Chen Amy H
Chicola Kristen
Collura Kristi
Compton Rosana O
Epps Ethan
Estill James C
Golser Wolfgang
Grover Corrinne
Ingles Jennifer
Karunakaran Santhosh
Kim Changsoo
Kudrna Dave
Lemke Cornelia
Lin Lifeng
Olive Jaime
Paterson Andrew H
Peterson Daniel G
Pierce Gary J
Rainville Lisa K
Rong Junkang
Tabassum Nabila
Tang Haibao
Um Eareana
ur Rahman Mehboob
Wang Xiyin
Wendel Jonathan F
Wing Rod A
Wissotski Marina
Yu Yeisoo
Zuccolo Andrea
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing.
Results: A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons, was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively, to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences.
Conclusion: Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence.

Directory of Open Access Journals